
    Cactus: Issues for Sustainable Simulation Software

    The Cactus Framework is an open-source, modular, portable programming environment for the collaborative development and deployment of scientific applications using high-performance computing. Its roots reach back to 1996 at the National Center for Supercomputing Applications and the Albert Einstein Institute in Germany, where its development began. Since then, the Cactus framework has witnessed major changes in hardware infrastructure as well as in its own community. This paper describes its endurance through these changes and, drawing upon lessons from its past, also discusses its future.
    Comment: submitted to the Workshop on Sustainable Software for Science: Practice and Experiences 201

    Attribute Elicitation: Implications in the Research Context

    Three different methods of attribute elicitation for two paper-based products were compared in this study. The three methods were free elicitation (FE), hierarchical dichotomization (HD), and Kelly's repertory grid (RG); the two paper-based products were bathroom tissue and paper towels. The methods were compared on level of abstraction, efficiency of data collection, convergent validity, and respondents' reaction to the task. The results indicated that the level of abstraction did not differ significantly between methods or products. However, a rank-order analysis revealed a substantial difference, with 18 to 20% of the attributes rated significantly different between the elicitation methods for paper towels and bathroom tissue, respectively. Convergent validity was exhibited between all the methods, although it was found to be highest between HD and RG. These findings suggest that all three elicitation methods draw very similar information from the consumers' knowledge base. On efficiency of data collection, FE took significantly less time both to complete the task and to elicit the individual attributes for both products, while HD was the least efficient method for either product. In the comparison of reactions to the task, FE was found to be the least difficult of the three methods and also allowed the respondents to express their opinions more freely.

    Shared memory parallelism in Modern C++ and HPX

    Parallel programming remains a daunting challenge, from the struggle to express a parallel algorithm without cluttering the underlying synchronous logic, to describing which devices to employ in a calculation, to ensuring correctness. Over the years, numerous solutions have arisen, many of them requiring new programming languages, extensions to existing languages, or the addition of pragmas, and support for these various tools and extensions varies widely. In recent years, the C++ standards committee has worked to refine the language features and libraries needed to support parallel programming on a single computational node. Eventually, all major vendors and compilers will provide robust and performant implementations of these standards; until then, the HPX library and runtime provides cutting-edge implementations of the standards, as well as of proposed standards and extensions. Because of these advances, it is now possible to write high-performance parallel code without custom extensions to C++. We provide an overview of modern parallel programming in C++, describing the language and library features and providing brief examples of how to use them.

    A Massive Data Parallel Computational Framework for Petascale/Exascale Hybrid Computer Systems

    Heterogeneous systems are becoming more common on High Performance Computing (HPC) systems. Even with tools like CUDA and OpenCL, it is a non-trivial task to obtain optimal performance on the GPU. Approaches to simplifying this task include Merge (a library-based framework for heterogeneous multi-core systems), Zippy (a framework for parallel execution of codes on multiple GPUs), BSGP (a new programming language for general-purpose computation on the GPU), and CUDA-lite (an enhancement to CUDA that transforms code based on annotations). In addition, efforts are underway to improve compiler tools for automatic parallelization and optimization of affine loop nests for GPUs, and for automatic translation of OpenMP-parallelized codes to CUDA. In this paper we present an alternative approach: a new computational framework for the development of massively data-parallel scientific applications, suitable for use on such petascale/exascale hybrid systems and built upon the highly scalable Cactus framework. As a first non-trivial demonstration of its usefulness, we successfully developed a new 3D CFD code that achieves improved performance.
    Comment: Parallel Computing 2011 (ParCo2011), 30 August -- 2 September 2011, Ghent, Belgium

    From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation

    Starting from a high-level problem description in terms of partial differential equations using abstract tensor notation, the Chemora framework discretizes, optimizes, and generates complete high-performance codes for a wide range of compute architectures. Chemora extends the capabilities of Cactus, facilitating the efficient use of large-scale CPU/GPU systems for complex applications without low-level code tuning. Chemora achieves parallelism through MPI and multi-threading, combining OpenMP and CUDA. Optimizations include high-level code transformations, efficient loop traversal strategies, dynamically selected data and instruction cache usage strategies, and JIT compilation of GPU code tailored to the problem characteristics. The discretization is based on higher-order finite differences on multi-block domains. Chemora's capabilities are demonstrated by simulations of black hole collisions. This problem provides an acid test of the framework, as the Einstein equations contain hundreds of variables and thousands of terms.
    Comment: 18 pages, 4 figures, accepted for publication in Scientific Programming
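    For context, the higher-order finite differences mentioned approximate a derivative with a wider weighted stencil of grid values; a standard fourth-order central-difference approximation of the second derivative (a textbook formula, not taken from the paper) is:

```latex
\left.\frac{\partial^2 u}{\partial x^2}\right|_{x_i}
  \approx \frac{-u_{i-2} + 16\,u_{i-1} - 30\,u_i + 16\,u_{i+1} - u_{i+2}}{12\,\Delta x^2}
```

    Higher order buys accuracy per grid point at the cost of wider stencils, which is why multi-block domains and careful ghost-zone handling matter for such frameworks.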

    Analysis of Weather Data Collected From Two Locations in a Small Urban Community

    The heat island effect is a well-known feature of the microclimate of urban areas, defined as the temperature difference between an urban area and its surroundings. While this study employs only two instruments, the authors are not aware of any studies that examine the hourly temperature differences between an instrument inside a town the size of Sedalia and its surroundings. We attempt to infer the impact of Sedalia, Missouri, the State Fair Community College campus, and the state fairgrounds on the temperature patterns of a small region of west-central Missouri. Two stations were used, one on the grounds of State Fair Community College and the other at the Sedalia Airport. Temperature, precipitation, cloudiness, and wind information were gathered hourly between 1 February and 31 March 2005. The weather station at the regional airport was located 11 km (7 miles) northeast of the campus instrument. Our results indicate that the city has no discernible impact on the distribution of monthly precipitation totals. We did find a distinct difference between the local surface temperatures recorded by the two instruments: for the Sedalia area, the town center was typically approximately 2 to 6 °F (1.0 to 3.3 °C) warmer than the surrounding environment, as inferred from these instruments. This difference was as much as 11 °F (6 °C) when comparing hourly temperature information, and it was larger on clear days and days with little wind.

    Benchmarking the Parallel 1D Heat Equation Solver in Chapel, Charm++, C++, HPX, Go, Julia, Python, Rust, Swift, and Java

    Many scientific high-performance codes that simulate, e.g., black holes, coastal waves, climate, and weather rely on block-structured meshes and use finite differencing methods to iteratively solve the appropriate systems of differential equations. In this paper we investigate implementations of an extremely simple simulation of this type using various programming systems and languages. We focus on a shared-memory, parallelized algorithm that simulates 1D heat diffusion using asynchronous queues for the ghost zone exchange. We discuss the advantages of the various platforms and explore the performance of this model code on different computing architectures: Intel, AMD, and ARM64FX. In our results, Python was the slowest of the set we compared; Java, Go, Swift, and Julia were intermediate performers; and the higher-performing platforms were C++, Rust, Chapel, Charm++, and HPX.

    Asynchronous Execution of Python Code on Task Based Runtime Systems

    Despite advancements in the areas of parallel and distributed computing, the complexity of programming on High Performance Computing (HPC) resources has deterred many domain experts, especially in the areas of machine learning and artificial intelligence (AI), from exploiting the performance benefits of such systems. Researchers and scientists favor high-productivity languages to avoid the inconvenience of programming in low-level languages and the cost of acquiring the necessary skills. In recent years, Python, with the support of linear algebra libraries like NumPy, has gained popularity despite limitations that prevent such code from running distributed. Here we present a solution which maintains both high-level programming abstractions and parallel and distributed efficiency. Phylanx is an asynchronous array processing toolkit which transforms Python and NumPy operations into code that can be executed in parallel on HPC resources, by mapping Python and NumPy functions and variables into a dependency tree executed by HPX, a general-purpose, parallel, task-based runtime system written in C++. Phylanx additionally provides introspection and visualization capabilities for debugging and performance analysis. We have tested the foundations of our approach by comparing our implementation of widely used machine learning algorithms to accepted NumPy standards.